Re: [-empyre-] Archives, metadata and searching



Dear Simon and Stephen (and everybody else of course):

I would like to explain briefly the approach of InterPARES to metadata.

Over the past couple of years, InterPARES has been building a database that registers existing metadata schemata and analyzes them against specific criteria, in order to establish whether a schema can provide evidence that a record is as accurate, reliable and authentic as it was when first saved. Thus, our analyses are concerned with the retrievability of records only indirectly, in the sense that a schema able to satisfy our requirements will also be a powerful retrieval instrument.

Having looked at several schemata and at several case studies of different types of records creators (in the arts, sciences, and e-government) who are either using existing schemata or generating personalized ones, we have arrived at the conclusion that any schema is adequate to any purpose if it allows one to identify the record in context and to establish its integrity. In other words, every digital entity should have identity metadata and integrity metadata.

Identity metadata are the attributes that uniquely identify a record and distinguish it from any other record. For a letter, they would be attributes like the names of the creator (the person in whose archives the letter is maintained), author (the human or organizational person issuing the letter), addressee and writer (the person articulating the discourse); the date on the document; the dates of transmission, receipt and archiving; subject matter; filing code; filing codes of the previous and subsequent letters; format; attachments; etc. For a telescope observation record, they would be attributes like name of star, inclination of telescope, time of observation, light curve, etc. Every creator should identify what is needed for identification (and therefore retrieval) of its own records.

Integrity metadata are data about responsibility for the record and for its changes over time. They include things like the name of the person responsible for handling or keeping the record, changes made to the record, dates and results of updates, upgrades, migrations, etc. The purpose is to demonstrate control over the maintenance process and to justify changes. The reason is that, years later, one wants to be able to demonstrate that the entity copyrighted or linked to somebody's intellectual rights 10 years before is the same entity, even if it looks a bit different.
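To make the distinction concrete, here is a minimal sketch in Python of a letter carrying both kinds of metadata. All class and field names are my own illustrative assumptions, not an InterPARES schema, and the equality test at the end is a deliberate simplification of what "same record" means.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass(frozen=True)
class LetterIdentity:
    """Identity metadata: attributes that distinguish this letter from any other.

    Field names are hypothetical; each creator would choose its own.
    """
    creator: str         # person in whose archives the letter is maintained
    author: str          # human or organizational person issuing the letter
    addressee: str
    writer: str          # person articulating the discourse
    document_date: date  # date on the document
    subject: str
    filing_code: str


@dataclass(frozen=True)
class ChangeEvent:
    """One entry in the integrity history (update, upgrade, migration, ...)."""
    when: date
    action: str
    responsible: str
    result: str


@dataclass
class IntegrityMetadata:
    """Who is responsible for the record and what has happened to it over time."""
    keeper: str
    history: List[ChangeEvent] = field(default_factory=list)

    def log(self, event: ChangeEvent) -> None:
        # Append only: past events are never rewritten.
        self.history.append(event)


@dataclass
class Record:
    identity: LetterIdentity
    integrity: IntegrityMetadata


def same_record(a: Record, b: Record) -> bool:
    """Simplified check: two manifestations count as the same record when
    their identity metadata match, whatever the integrity history says."""
    return a.identity == b.identity
```

In this sketch a migrated copy keeps the same identity attributes as the original, while the migration itself is logged as a ChangeEvent, so the difference between the two is documented and can be justified.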

Now, all the metadata indicated above are the responsibility of the creator and are chosen by the creator. Once the digital record goes to the preserver, it goes as part of an aggregation of material. The preserver should use metadata schemata representing the identifying attributes and the integrity information of the aggregation, not of its individual components. Linked to the metadata for the aggregation should be all the documentation related to that unit (name of creator, type and scope of material, historical development, how the material was originally used, circumstances of acquisition, internal relationships among its parts, technological characteristics, other related material, how the preserver has upgraded the material to keep it accessible, consequent changes, etc.; we call this archival description), and directions on how to retrieve things once one is inside the aggregation. Once the aggregation is retrieved by the user on the basis of the preserver's metadata, then the original metadata schemata of the creator are used to get the specific record.
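The two-step retrieval just described could be sketched roughly as follows, again in Python. The Aggregation class, its two fields and both function names are hypothetical, and a real archival description would carry far more than this.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Aggregation:
    """A unit acquired by the preserver, described as a whole."""
    # Preserver-level metadata (a hypothetical subset of an archival description).
    creator: str
    scope: str
    # Each record keeps whatever identity attributes its creator chose.
    records: List[Dict[str, str]] = field(default_factory=list)


def find_aggregations(holdings: List[Aggregation], **criteria: str) -> List[Aggregation]:
    """Step 1: match on the preserver's aggregation-level metadata only."""
    return [a for a in holdings
            if all(getattr(a, key, None) == value for key, value in criteria.items())]


def find_records(aggregation: Aggregation, **criteria: str) -> List[Dict[str, str]]:
    """Step 2: inside one aggregation, match on the creator's own metadata."""
    return [r for r in aggregation.records
            if all(r.get(key) == value for key, value in criteria.items())]
```

Note that the two aggregations in a given holding may use entirely different record attributes (star names for an observatory, addressees for correspondence); the preserver's search never needs to know them, which is the point of keeping the two levels separate.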

To make a long story short, we do not believe in one size fits all. We believe that metadata schemata should be built according to the same principles, but should be different from creator to creator unless the creators are doing the same things and producing the same records (which is usually true only in government and some types of businesses). We also believe that preservers should not be attaching metadata to records, but to the entire entity that they acquire as a unit, and should not be telling records creators what metadata to use, other than advising them in general on the principles that should guide their choice.

With all the above said, on Feb. 20 all InterPARES archival theorists will get together to sort out the metadata concept on the basis of findings to date, so everything may change...but not by much...I do not think.

Cheers,

Luciana



